API Examples

Author: Guorong Xu

2016-09-19

This notebook shows how to calculate correlations, annotate gene clusters, and generate JSON files on AWS.

Notice: Before running this notebook, open /notebooks/BasicCFNClusterSetup.ipynb and install the CFNCluster package on your Jupyter notebook server.

1. Configure AWS key pair, data location on S3 and the project information



In [ ]:

    
import os
import sys

sys.path.append(os.getcwd().replace("notebooks", "cfncluster"))

## S3 input and output address.
s3_input_files_address = "s3://path/to/input folder"
s3_output_files_address = "s3://path/to/output folder"

## CFNCluster name
your_cluster_name = "testonco"

## The private key file used to access the cluster.
private_key = "/path/to/private_key.pem"

## Whether to delete the CFNCluster after the job is done.
delete_cfncluster = False
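
Before creating a cluster, it can help to sanity-check the settings above. The following is a minimal sketch, assuming Python 3; it is not part of the CFNCluster package. The helpers `is_valid_s3_uri` and `check_private_key` are hypothetical names introduced here. They only check that an address has the `s3://bucket/...` shape and that the key file exists locally; they do not contact AWS.

```python
import os
from urllib.parse import urlparse


def is_valid_s3_uri(uri):
    """Return True if uri has the form s3://bucket[/key] (hypothetical helper)."""
    parsed = urlparse(uri)
    # The bucket name is parsed as the network location of the URI.
    return parsed.scheme == "s3" and bool(parsed.netloc)


def check_private_key(path):
    """Return True if the private key file exists and is readable (hypothetical helper)."""
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

For example, `is_valid_s3_uri(s3_input_files_address)` and `check_private_key(private_key)` could be called in the cell above once the placeholder values have been replaced.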

2. Create CFNCluster

Notice: The CFNCluster package can only be installed on a Linux machine that supports pip installation.



In [ ]:

    
import CFNClusterManager, ConnectionManager

## Create a new cluster
master_ip_address = CFNClusterManager.create_cfn_cluster(cluster_name=your_cluster_name)
ssh_client = ConnectionManager.connect_master(hostname=master_ip_address,
               username="ec2-user",
               private_key_file=private_key)

After you have verified the project information, you can execute the pipeline. When the job is done, you will see the log information returned from the cluster.

Checking the disease names



In [ ]:

    
import PipelineManager

## You can call this function to check the disease names included in the annotation.
PipelineManager.check_disease_name()

## Choose the disease name from the list printed by check_disease_name().
disease_name = "BreastCancer"

Run the pipeline with the chosen operation.



In [ ]:

    
import PipelineManager
    
## Define the operation:
## calculate: calculate correlation
## oslom_cluster: cluster the gene modules
## print_oslom_cluster_json: print JSON files
## all: run all operations

operation = "all"

## run the pipeline
PipelineManager.run_analysis(ssh_client, disease_name, operation, s3_input_files_address, s3_output_files_address)
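
Because a job on the cluster can take a while, a mistyped operation name is cheaper to catch locally. The sketch below is an assumption, not part of PipelineManager: `VALID_OPERATIONS` and `validate_operation` are hypothetical names, and the allowed values are simply the four operations listed in the comments above.

```python
# The four operation names documented in the cell above.
VALID_OPERATIONS = ("calculate", "oslom_cluster", "print_oslom_cluster_json", "all")


def validate_operation(operation):
    """Return operation unchanged if it is recognized, else raise ValueError."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(
            "Unknown operation %r; expected one of: %s"
            % (operation, ", ".join(VALID_OPERATIONS))
        )
    return operation
```

Calling `validate_operation(operation)` before `PipelineManager.run_analysis(...)` would raise immediately on a typo instead of after the job has been submitted.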

To delete the cluster, set the cluster name and call the function below.



In [ ]:

    
import CFNClusterManager

## Delete the cluster only if requested in the configuration cell.
if delete_cfncluster:
    CFNClusterManager.delete_cfn_cluster(cluster_name=your_cluster_name)
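
If the cluster should be removed even when the pipeline fails, the run and the deletion can be combined with `try`/`finally`. This is one possible pattern, not something the notebook's modules provide: `run_with_cleanup` is a hypothetical helper that takes the job and the cleanup step as plain callables, so it makes no assumptions about how PipelineManager or CFNClusterManager behave.

```python
def run_with_cleanup(run_job, cleanup, delete_after):
    """Run run_job(); call cleanup() afterwards if delete_after is true.

    Hypothetical helper: cleanup runs even when run_job raises,
    and the exception is still propagated to the caller.
    """
    try:
        return run_job()
    finally:
        if delete_after:
            cleanup()
```

With the names used in this notebook, `run_job` could be `lambda: PipelineManager.run_analysis(ssh_client, disease_name, operation, s3_input_files_address, s3_output_files_address)`, `cleanup` could be `lambda: CFNClusterManager.delete_cfn_cluster(cluster_name=your_cluster_name)`, and `delete_after` could be `delete_cfncluster`.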